Modelling timelines to elimination of sleeping sickness in the Democratic Republic of Congo, accounting for possible cryptic human and animal transmission

Background Sleeping sickness (gambiense human African trypanosomiasis, gHAT) is a vector-borne disease targeted for global elimination of transmission (EoT) by 2030. There are, however, unknowns that have the potential to hinder the achievement and measurement of this goal. These include asymptomatic gHAT infections (inclusive of the potential to self-cure or harbour skin-only infections) and whether gHAT infection in animals can contribute to the transmission cycle in humans. Methods Using modelling, we explore how cryptic (undetected) transmission impacts the monitoring of progress towards and the achievement of the EoT goal. We have developed gHAT models that include either asymptomatic or animal transmission, and compare these to a baseline gHAT model without either of these transmission routes, to explore the potential role of cryptic infections on the EoT goal. Each model was independently calibrated to five different health zones in the Democratic Republic of the Congo (DRC) using available historical human case data for 2000–2020 (obtained from the World Health Organization’s HAT Atlas). We applied a novel Bayesian sequential updating approach for the asymptomatic model to enable us to combine statistical information about this type of transmission from each health zone. Results Our results suggest that, when matched to past case data, we estimated similar numbers of new human infections between model variants, although human infections were slightly higher in the models with cryptic infections. We simulated the continuation of screen-confirm-and-treat interventions, and found that forward projections from the animal and asymptomatic transmission models produced lower probabilities of EoT than the baseline model; however, cryptic infections did not prevent EoT from being achieved eventually under this approach. Conclusions This study is the first to simulate an (as-yet-to-be available) screen-and-treat strategy and found that removing a parasitological confirmation step was predicted to have a more noticeable benefit to transmission reduction under the asymptomatic model compared with the others. Our simulations suggest vector control could greatly impact all transmission routes in all models, although this resource-intensive intervention should be carefully prioritised. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13071-024-06404-4.

• Previously model comparison of more than two Warwick gHAT models was done using weighting from the deviance information criteria (DIC) (e.g.[22,17,24]).Here Bayes factors are used instead, extending the approach used by Crump et al. [10] to compare the baseline and animal models.
• No previous gHAT asymptomatic model fitting (from any group) has assessed whether the asymptomatic model is statistically likely compared to other model variants or quantified the impact of asymptomatic transmission on the predicted dynamics and elimination of transmission.
the health zone level and for each year based on recorded geolocations (where known) or location descriptions.
Cases were classified as active or passive and each was labelled as "stage 1", "stage 2" or "stage unknown".In general, before 2015, staging information is unknown but is present in most data for 2015 and 2016.We also record the total number of people listed as tested by the active screening teams for each health zone for each year.
Figures 4-5 show the extracted and aggregated data for the five health zones of this study that were used for fitting.

Model variants
The compartmental gHAT infection model (Figure 6) and its equations (Eqn 1) have been presented elsewhere previously [9,10,1] for either the baseline model and model with animal transmission or the asymptomatic model, but not all together.Descriptions of model parameters can be found in Tables A and B.

Only in the model with animal transmission
Only in the asymptomatic model active detection of low-risk people (screen-confirm-treat) 1=low-risk 4=high-risk tsetse animals capable of acquiring and transmitting infection Fig 6: Compartmental model schematic using Greek notation for rates and capital Roman characters for different host and vector infection states.Blue components form the baseline model and are also included in the other two model variants.The pink boxes and arrows are only found in the animal model and the green box and arrows are only in the asymptomatic model variant.Transmission pathways are shown as dashed grey lines.Births and deaths are included but are not shown here to aid readability.The grey oval and dashed black lines indicate infection classes assumed to be detectable using a traditional screen-confirm-treat approach in active screening (although some infections may still be missed due to imperfect diagnostic sensitivity).

Improvements to active screening diagnostic specificity
The active screening diagnostic algorithm characteristics are dependent on the series of tests performed before someone can be considered a confirmed case and treatment can be administered.Except for the modelled "screen-and-treat" (S&T) strategy, all serological suspects are assumed to need a parasitological confirmation test to receive treatment due to current drug administration guidelines for either pentamidine, NECT or fexinidazole [15].
Parasitological confirmation is made through microscopy, requiring trained personnel to make the diagnosis.Visualisation of the parasite in this manner should theoretically enable perfect specificity of the confirmation test (and therefore of the full diagnostic algorithm), however, the increasing rarity of the infection means that most technicians have limited recent experience in making confirmations.In our fitting procedure, we estimate the health-zone-specific active screening specificity based on historical case reporting between 2000- [25] * The value of   was chosen to maintain constant population size in the absence of vector control interventions as discussed in [23].
* * The value of  was chosen to reflect a plausible bounce-back rate as discussed in [23]; this value has minimal impact in this study as no cessation of vector control is simulated.* * * Pupal survival computed based on a 1% per day mortality rate of pupae over 27 days.
† Whilst the teneral phenomenon is well known, the exact value for the reduction in susceptibility is unknown and likely depends on the age and nutritional status of fed flies [12].Previous modelling demonstrates that this parameter is non-identifiable in model fitting and will be highly correlated with  2 0 which is fitted [22].‡ Overdispersion for both active and passive screening was selected to match the typical variance in case detections at the health zone level as discussed in [9].2020.Whilst specificity is assumed to be very high (>99.8%) the high coverage of active screening in many health zones means that it is very likely that some false positives were incorrectly confirmed.
To combat the challenge of accurately making confirmations in the field in the context of falling cases, mobile teams in former Bandundu province now also utilise video confirmation which records the parasites for quality assurance by a second person.This technology was rolled out in Mosango health zone in 2015 and in most other health zones in the former Bandundu province in 2018, therefore we increased our modelled active screening algorithm specificity to 100% in the former province of Bandundu in the corresponding year.
In former Equateur province video tablets are not being used by active screening teams.However, we do assume that if large screenings take place and case reporting is low, there would be more careful follow-up of case confirmations.We, therefore, assume perfect active screening algorithm specificity in the former Equateur province from 2024 (the beginning of our projections).

Improvements to passive screening
We assume, based on a substantial shift in the provincial-level stage 1 to stage 2 case ratio between 2000-2016 [16], that there is an improvement in passive case detection rates in the former Bandundu province (including Mosango and Bagata health zones) [4].An improvement to passive screening was also fitted for Bominenge, Budjala and Mbaya health zones in the former province of Equateur but using different priors on the parameters involved.The priors were based on modelling of former-province level data for 2000-2012 [16], augmented with WHO HAT Atlas data to make a 2000-2016 data set.An arbitrary amount of additional variation was incorporated into the priors.Note that the province-level data contained staging which is not available in this period in the WHO HAT Atlas but which was expected to be informative for this process.
We assume that all stage 1 cases are reported, but that some of the exits from stage 2 are due to death from gHAT disease outside healthcare.In 1998 the reporting probability for an exit from stage 2 is given by , however as the exit rate from stage 2 increases this reporting probability does not stay constant, but increases (proportionally more people would be detected and treated with higher exit rates).When we compute reporting rates from stage 2 we therefore use the following:

Vector control
The function which describes the probability of a host-seeking tsetse both hitting a Tiny Target and dying as a result,   , is time-dependent (, in days) from when the targets were first deployed: and  max -the maximum daily probability of contacting a Tiny Target and dying as a result.  modifies all the bite rates  in our tsetse equations to produce an additional Tiny-Target-induced mortality for tsetse.
max is chosen such that the tsetse population after one year is at the observed/assumed percentage reduction.For this model of   this is given by  max = 0.0525 for an assumed annual tsetse population density reduction of 80%.The value 182.5 reflects twice-yearly deployments of Tiny Targets, as used in DRC [27].

Fitting procedure Model initialisation
For each model for a particular parameter set, we analytically compute pre-1998 initial conditions assuming that the system was at endemic equilibrium.In 1998 we assume that passive detection rates increased with the introduction of the CATT test and that active screening began at the same level as seen in the data for 2000.

Likelihood and Markov chain Monte Carlo
We describe the fitting procedure for all models below, with text adapted from our previous fitting publications [9,10].
Twelve parameters;  0 , ,   ,   ,   0 ,  1 , , Spec,  change ,   amp ,   amp , and  steep were fitted in all health zones for all models.A further two parameters;   and   , were fitted for the model with animal transmission and five parameters: , ,    ,    and   were fitted for the asymptomatic model.See Table B for definitions and information on prior distributions used for these parameters.The 12 baseline model parameters and two additional parameters of the model with animal transmission were first used in health zone-level analyses in DRC in Crump et al. 2021 (baseline) [9] and 2022 (with animal transmission) [10].
For fitting the model variants to case data we transform model ODE solutions (for Equations 1) into annual case reporting denoted  1 ,  2 , for active stage 1 and stage 2 and  1 ,  2 , for passive stage 1 and 2. Since we always know the stage (1 or 2) in the model simulations there is no requirement for a "U" (unknown stage) category for the model.These are computed using solutions to the ODEs for the given set of parameters aggregated across a year.
The new annual reported case incidence is either by passive detection from stage 1 for the year  : passive detection from stage 2 or by active screening from the low-risk (1) group in year and with variable active screening coverage by year, ( ) and fixed diagnostic sensitivity. 1 also contains any false positives that may be incorrectly identified from non-infected people based on the high but imperfect specificity of the active screening algorithm.We assume in DRC that all false positives would be assigned as stage 1.
The log-likelihood function used in the adaptive Metropolis-Hastings MCMC contained two terms in each year for which reported case numbers were available for each source of reported cases (active or passive screening).These were: • a beta-binomial probability that the total number of cases reported in that year for that source came from the available population (either the reported number of people actively screened for active screening or the health zone population for passive screening) with probability calculated from solving the ODE for the current set of parameters, and • a binomial probability that the reported stage 1 cases came from the total number of reported staged cases where the probability parameter again comes from the solution of the ODE.In many years staging is unknown so this part of the log-likelihood will return zero and not contribute to our calculation.In some years, we only partially know staging information.
This formulation allowed over-dispersion in the observed cases to be included, via the beta-binomial distribution, and any proportion of cases with reported disease stage to be appropriately accounted for (assuming that the reporting of staging information is independent of the disease stage).The log-likelihood function was as follows: The model takes parameterisation ,  is the data,    () and    () are the number of cases detected by passive or active screening (of stage , which may be 1, 2 or unknown, ) in year  of the data.   () and    () are the number of cases detected by passive or active screening (of stage ) in year  of the model, and () is the number of people screened in year .BetaBin(; , , ) gives the probability of obtaining  successes out of  trials with probability  of success in a single trial and overdispersion parameter .The overdispersion accounts for a larger variance than under the binomial.The probability density function of this distribution is given by: where  = (1/ − 1) and  = (1 − )/.

Sequential MCMC fitting of the asymptomatic model to health zones, with Bayesian updating of priors
There are five parameters which are unique to our model with transmission of HAT to and from asymptomatic human infections: 1. the proportion of human exposures resulting in initial blood infection (  ); 2. the self-cure rate for stage 1 blood infections (   ); 3. the self-cure rate for skin-only infections (   ); 4. the transition rate from skin-only to blood infection (); and

the relative infectiousness of skin-only infection compared to a blood infection (𝑥).
We regard these parameters as being innate, biological values and therefore assume that they should be invariant across geographies.This is in contrast to the parameters which are unique to the model with transmission to and from a non-specific animal source, which is liable to vary as a result of differences in the animal species, variability in the amount of animal and tsetse habitats and interactions between these hosts and tsetse.
The best way to fit the asymptomatic model would be to consider all health zones together in a hierarchical model, however, this would be computationally very demanding especially if using this method to fit the model to many more regions e.g.all endemic health zones in the whole of the DRC (>150) or all endemic health areas (regions of around 10,000 people) of the DRC (>1,000).We have therefore opted to perform Bayesian updating of priors such that posterior parameter distributions from health zones that are informative for these parameters provide the prior distributions used for the asymptomatic model-specific parameters in other health zones.
The procedure used is as follows: 1.All health zones were analysed with the same univariate prior distributions (Table B), five MCMC chains were run in each analysis with 1,000 posterior samples taken from each chain, to give 5,000 posterior samples overall with a minimum effective sample size of 2,500.
2. The Kullback-Liebler divergence of the realised marginal posterior distributions from the univariate, independent prior distributions,  KL ( | ), were calculated, summed, and used to assign a rank to the health zones  = 1, . . ., 5, where the health zone ranked 1 had the highest total  KL across the five parameters.
3. Health zones  = 2, . . ., 5 were re-analysed using a prior based on the posterior distribution of the five asymptomatic specific parameters from the analysis of health zone  − 1.A Gaussian mixture model (GMM, a weighted mixture of multivariate normal distributions) was fitted to the samples from the posterior distribution from the analysis of health zone  − 1 (see below).The resulting parametric distribution formed the multivariate prior for the five parameters specific to the asymptomatic model in the analysis of health zone .

GMM fitting
For the MCMC analysis of health zone , the 5,000 posterior samples from the analysis of HZ  − 1 were read in.Each of the five asymptomatic model-specific parameters was first transformed to a (−∞, +∞) scale and then to have a mean of zero and a standard deviation of one.The transformation to convert to a (−∞, +∞) scale and the centring and scaling values are retained to create the Jacobian for the scaled GMM prior.
A Gaussian Mixture Model was fitted to the transformed parameters using the MATLAB routine fitgmdist.
Akaike's Information Criterion (AIC) was used to select the number of Gaussian distributions to include in the mixture.
A MATLAB structure was created containing the final GMM, identifiers for the parameters, and transformation information and this was saved to act as the multivariate five-parameter prior for the asymptomatic modelspecific parameters in the analysis of the  th health zone.

Model evidence and ensemble model
This section has been adapted slightly from the Supplementary Information of Antillon et al. [2].
The results from the stochastic projections for the three model variants were combined into an ensemble model, with the proportion taken from each model based on the relative model evidence.The model evidence, or marginal likelihood, for each model, was estimated using an importance sampled estimator [28].The relative model evidence for the  th model was then the model evidence for model  divided by the sum of the model evidences across the three models.
By use of MCMC methods to investigate the posterior distribution of the parameters, calculation of (x|m) is avoided.Calculation of the evidence for use in model comparison requires computing the integral: Equation 15 cannot be calculated analytically except for some small set of tractable models.It can, however, be rewritten as equation 16, where (  ) is a   -dimensional probability density function.From this, an importance sampled estimator of (x|m) is: where the  , are  samples drawn from .
A defence mixture [14] was used for (  ): where (•) is a mixture of  multivariate Gaussian distributions with vectors of means   (  = {1 . . .}), and covariance matrices C  ,  *    is the Jacobian transformation relating probability on transformed and original scales, and  is a mixing proportion ( = 0.95 was chosen for use, being a typical value [28]).
In each of our health-zone-level MCMC analyses of the models with and without animal transmission, 2 000 samples from the joint posterior distribution were generated and stored, and    ; ,  1 . . .  , C 1 . . .C  for each health zone and model was chosen using the Matlab fitgmdist function, selecting  based on Akaike's Information Criterion (AIC).To account for the high correlations between some of our model parameters, regularisation was applied to ensure that the covariance matrices, C  , would be positive semi-definite.Before passing to fitgmdist, transformations were applied to the posterior samples to put them in the range (−∞, ∞) -appropriate for Gaussian distributions -followed by scaling and centring to keep the regularisation consistent across analyses, at least at the simple, single overall covariance matrix level.
Having defined (•) for a given analysis (health zone, model combination), P was calculated (equation 17) using  = 10 000 samples drawn from (  ).Note that this is an increase from the 2 000 samples used in a previous study [10], and is expected to reduce the possible influence of sampling on the model evidence results.

Model fitting -posteriors
For the model with asymptomatic human transmission, we obtained two realised posterior distributions for each of the five parameters that are specific to this model; one from the initial run in which the parameters all had the same prior distributions and one from the sequential Bayesian updating (SBU)run, in which a multivariate prior based on the realised joint posterior of the previous health zone in the sequence was used.Figure 7 shows these for each of our five health zones.Bominenge, being the health zone with the highest total Kullback-Liebler divergence, did not need to be re-run in the SBU sequence and hence only has one set of posterior distributions.It is apparent that there was little information in any of the health zones for the transition rate from skin-only to blood infection () or the self-cure rate for stage 1 blood infections   neither diverges from the prior in any health zone in either analysis run.The impacts of the SBU were to shift the distributions of the proportion of infections resulting in initial blood infections (  ) and the self-cure rate for skin-only infections (   ) towards higher values, and to reduce the relative infectiousness of skin-only infections compared to blood infections.

Model fitting -time series plots
The following graphs show a comparison of our model outputs to data and include the estimated new human infection incidence.EoT by each year, calculated by taking the number of realisations where there are no new infections to humans in or after that year until the end of the simulation and diving by the total number of realisations (20,000 for the baseline and animal models and 50,000 for the asymptomatic model) The ensemble results are given by the orange curve which is computed as the weighted average of the probability of EoT from the individual model variants to avoid statistical sampling variation.As per Table 2of the main text , for the second strategy with vector control (VC), we assume this novel intervention begins in 2024, and for the third strategy using screening-and-treat (S&T), we assume this novel intervention begins in 2028.

NTD PRIME criteria
To fulfil recommendations set out for good modelling practises for policy, we itemise below how each of the five key principles relating to communication, quality and relevance of analyses -known as Policy-Relevant Items for Reporting Models in Epidemiology of Neglected Tropical Diseases (PRIME-NTD) [3] have been met in Table F. See Materials and Methods section in the main text, Supplementary Information (file S1) and at OSF (https://osf.io/73ytc/)

Complete description of data used
The original data and how we aggregated the data for fitting were described in detail in the SI and also other model fitting papers for DRC from our group including a recent update.

Communicating uncertainty
Structural uncertainty: The focus of this paper is to assess statistical support for three alternative gHAT model variants -one with no cryptic infections, one with animal transmission and one with self-curing asymptomatic humans.In previous studies, eight variants were considered [22,17] and there was the highest support for "Model 4" (called baseline in the present study) and "Model 7" (animal transmission) which are examined here.The asymptomatic model has not been fitted or statistically compared previously.We show both fits and projections side by side to show model differences driven by this structural uncertainty  Where in the manuscript is this described?
Prediction uncertainty: All predictions incorporate structural and parameter uncertainty.We show three strategies.

Testable model outcomes
Previous versions of this model have undergone validation exercises (data censoring) to examine the robustness of the predictive ability of the model [17,21].Whilst this was not performed here, the model is an updated version of those that have undergone validation, with updates based on the critical review of model fits as data for different regions or time periods (notable refinements also included here were made in the related analysis of Antillon et al. [2]).As more years of data become available in the future the model outputs shown here will be able to be compared to reported active and passive case data to test model predictions.
Main text and SI figures

Figures 8 -
show the realised posterior distributions of all the parameters fitted across the three model variants and five health zones.The parameters of the baseline model do not change greatly when fitted in either of the cryptic transmission models.

Fig 7 :
Fig 7: Posterior distributions of asymptomatic model specific parameters before and after sequential Bayesian updating(SBU).Histograms of posterior samples from before and with SBU updating and their prior distributions.Rows are ordered by total Kullback-Liebler divergence, hence Bominenge only has pre-SBU outputs (see TableC).

Fig 8 :Fig 9 :Fig 10 :Fig 11 :Fig 12 :
Fig 7: Posterior distributions of asymptomatic model specific parameters before and after sequential Bayesian updating(SBU).Histograms of posterior samples from before and with SBU updating and their prior distributions.Rows are ordered by total Kullback-Liebler divergence, hence Bominenge only has pre-SBU outputs (see TableC).

Fig 13 :Fig 14 :Fig 15 :Fig 16 :Fig 17 :Fig 18 :Fig 19 :Fig 20 :Fig 21 :Fig 22 :Fig 23 :Fig 24 :Fig 25 :Fig 27 :Fig 28 :Fig 29 :Fig 31 :Fig 32 :Fig 33 :
Fig 13:  Comparison of fits in Bominenge.The deterministic model was used to perform fitting and sampling was conducted by using the stochastic model with the fitted posterior distributions.Blue, pink and green box and whisker plots show the baseline model, model with animal transmission and asymptomatic model fits respectively.The orange boxes represent the ensemble model outputs.The central line of each box is the median, the box is the 50% credible interval (CI) and the whiskers show the 95% CI.Case data are shown as a black line.New infections are estimated through the model fit, however, there is no way to directly observe this so there are no corresponding observational data.

Table A :
Model parameterisation (fixed parameters).Notation, a brief description, and the values used for fixed parameters.

Table B :
Model parameterisation (fitted parameters).Notation, brief description, and information on the prior distributions for fitted parameters.

Table C :
Approximate Kullback-Liebler divergence between the prior and posterior distributions for each of the parameters specific to the asymptomatic human transmission model and their total, which was used to decide the order of running the sequential Bayesian updating.

Table E :
Relative model evidence (%) after sequential Bayesian updating of the asymptomatic human transmission model -weights in the ensemble model.See main text Figure 5.

Table F :
PRIME-NTD criteria fulfillment.How the NTD Modelling Consortium's "5 key principles of good modelling practice" have been met in the present study.This study is one in a series of modelling analyses for the DRC.The modelling team led the simulation and analysis work guided by members of the national sleeping sickness control programme in DRC (PNLTHA-DRC)coauthors E Mwamba Miaka and S Chancy.PNLTHA-DRC have supported model development through the context of how the human case data used in the study were collected and how this has changed over time (via in-person meetings, online meetings and by email).Full model fitting code and documentation are available through OpenScienceFramework (OSF).The model is fully described in the main text and SI.

Table F -
continued from previous page Principle What has been done to satisfy the principle?